Meta-Analyses of Splicing and Expression Quantitative Trait Loci Identified Susceptibility Genes of Glioma

Background The functions of most glioma risk alleles are unknown. Very few studies had evaluated expression quantitative trait loci (eQTL), and insights of susceptibility genes were limited due to scarcity of available brain tissues. Moreover, no prior study had examined the effect of glioma risk alleles on alternative RNA splicing. Objective This study explored splicing quantitative trait loci (sQTL) as molecular QTL and improved the power of QTL mapping through meta-analyses of both cis eQTL and sQTL. Methods We first evaluated eQTLs and sQTLs of the CommonMind Consortium (CMC) and Genotype-Tissue Expression Project (GTEx) using genotyping, or whole-genome sequencing and RNA-seq data. Alternative splicing events were characterized using an annotation-free method that detected intron excision events. Then, we conducted meta-analyses by pooling the eQTL and sQTL results of CMC and GTEx using the inverse variance-weighted model. Afterward, we integrated QTL meta-analysis results (Q < 0.05) with the Glioma International Case Control Study (GICC) GWAS meta-analysis (case:12,496, control:18,190), using a summary statistics-based mendelian randomization (SMR) method. Results Between CMC and GTEx, we combined the QTL data of 354 unique individuals of European ancestry. SMR analyses revealed 15 eQTLs in 11 loci and 32 sQTLs in 9 loci relevant to glioma risk. Two loci only harbored sQTLs (1q44 and 16p13.3). In seven loci, both eQTL and sQTL coexisted (2q33.3, 7p11.2, 11q23.3 15q24.2, 16p12.1, 20q13.33, and 22q13.1), but the target genes were different for five of these seven loci. Three eQTL loci (9p21.3, 20q13.33, and 22q13.1) and 4 sQTL loci (11q23.3, 16p13.3, 16q12.1, and 20q13.33) harbored multiple target genes. Eight target genes of sQTLs (C2orf80, SEC61G, TMEM25, PHLDB1, RP11-161M6.2, HEATR3, RTEL1-TNFRSF6B, and LIME1) had multiple alternatively spliced transcripts. Conclusion Our study revealed that the regulation of transcriptome by glioma risk alleles is complex, with the potential for eQTL and sQTL jointly affecting gliomagenesis in risk loci. QTLs of many loci involved multiple target genes, some of which were specific to alternative splicing. Therefore, quantitative trait loci that evaluate only total gene expression will miss many important target genes.


BACKGROUND
Gliomas are among one of the most devastating of rare cancers and are ranked first among all cancers in terms of average years of life lost (Rouse et al., 2016). The only environmental risk factor consistently identified is ionizing radiation (Ostrom et al., 2015). In the past decade, a number of genome-wide association studies (GWAS) and a meta-analysis of GWAS validated 25 risk alleles for glioma (Shete et al., 2009;Wrensch et al., 2009;Sanson et al., 2011;Stacey et al., 2011;Jenkins et al., 2012;Rajaraman et al., 2012;Enciso-Mora et al., 2013;Walsh et al., 2014;Kinnersley et al., 2015;Melin et al., 2017). The molecular mechanism of glioma risks conferred by most of these variants is unknown. A method to discover target genes of risk SNPs is molecular quantitative trait loci (molQTL) mapping, using molecular and singlenucleotide polymorphism (SNP) data from relevant cells and tissues for that trait. Although the ascertainment of relevant tissues for some traits had yielded surprising results, heritability enrichment analyses confirmed non-diseased brain tissues as the most relevant for diseases of the brain Hormozdiari et al., 2018).
Previous analyses integrating expression quantitative trait loci QTL (eQTL) data with glioma GWAS have provided limited insights to date. One study used glioma datasets from The Cancer Genome Atlas (TCGA) and lymphoblastoid cell line data from Genetic EUropean VAriation in DISease (GEUVADIS) (Lappalainen et al., 2013;Kinnersley et al., 2015); two others utilized brain tissue data of Genotype-Tissue Expression Project (GTEx), and one also used a blood eQTL dataset (Westra et al., 2013;Melin et al., 2017;Wu et al., 2019). Among the three studies, significant target genes were identified in three loci using GTEx brain tissues but none using glioma tissues from TCGA (Kinnersley et al., 2015;Melin et al., 2017). Two more loci harbored significant eQTL using genotyping and expression data from whole blood, but significance of the results was unclear because lymphocytes may not share heritability with central nervous system (CNS) cells. Therefore, there is a need of a larger QTL study or meta-analysis of QTLs based upon nondiseased brain tissues.
Moreover, the scope of eQTL analyses published to date was limited, because analyses often involved a limited number of correlated regulatory SNPs in each locus and cis gene. Recent functional assays suggested that few of the top GWAS risk alleles are themselves functional or causal, and many are in fact in linkage disequilibrium (LD) with one or several functional SNPs in a given locus (Biancolella et al., 2014;Fortini et al., 2014;Lawrenson et al., 2016;Buckley et al., 2019). Moreover, studies have found that target genes may not be the nearest gene to a risk SNP (Buxton et al., 2019;Farashi et al., 2019). Thus, there is a need for a more comprehensive QTL evaluation.
Approximately 48% of GWAS loci harbored eQTLs (Joehanes et al., 2017). For those without eQTLs, the effect of risk alleles may be mediated through alternative transcript splicing rather than total gene expression. In fact, RNA splicing is the most abundant within the brain (Yeo et al., 2004;Vuong et al., 2016). It plays an important role in normal function and development of the central nervous system (Vuong et al., 2016). Numerous studies have demonstrated that de novo germline mutations and germline variants contribute to the risk of neurological diseases by affecting alternative splicing (Xiong et al., 2015;Reble et al., 2018). Recently, two independent genome-wide splicing QTL (sQTL) mappings identified 8,966 and 9,028 sQTLs in the non-diseased human brain involving > 3,000 genes, respectively, supporting the idea that genetic variants commonly regulate gene transcription in the brain via the formation of alternatively spliced transcripts (Takata et al., 2017;Raj et al., 2018). Moreover, recent sQTL evaluations have contributed additional target genes of cancer risk variants (Gusev et al., 2019;Guo et al., 2020). Therefore, adding sQTL to eQTL analysis may lead to the discovery of alternative functional mechanisms not explained by eQTL study alone.
Here, we sought to evaluate sQTLs in validated glioma risk loci and to perform comprehensive eQTL and sQTL meta-analyses. To our knowledge, no prior study evaluated alternative spliced genes in glioma susceptibility. For this purpose, we used a validated, annotation-free quantification of the RNA splicing method to identify variable splicing events from short-read RNA-seq data. Moreover, to conduct both eQTL and sQTL meta-analyses, we pooled genotyping or whole-genome sequencing and RNA-seq data from the CommonMind Consortium (CMC), which is the largest resource of postmortem brain tissues in the United States (US), and GTEx (multiple brain tissues) for a combined sample size of 354 unique non-diseased individuals' brain tissues (European ancestry) (Fromer et al., 2016;GTEx Consortium et al., 2017). In order to ensure that the same variants are likely responsible for the signals in both GWAS and QTLs, significant results from QTL meta-analyses were integrated with the Glioma International Case-Control Study (GICC) GWAS metaanalysis (Melin et al., 2017), using a summary data-based mendelian randomization method (SMR). Furthermore, we evaluated functional enrichment of significant SNPs identified through meta-analyses and SMR and further annotated SMRassociated SNPs with publicly available ChIP-seq and RNAbinding protein datasets.

Study Datasets
We used SNP genotyping and RNA-seq data from the CommonMind Consortium (CMC release 1.0) and Genotype-Tissue Expression Program (GTEx version 7.p2) for this meta-analysis (Fromer et al., 2016;GTEx Consortium et al., 2017). Data generated for CMC came from postmortem human brain specimens originating from the tissue collections at three brain banks: Mount Sinai NIH Brain Bank and Tissue Repository, The University of Pittsburgh Brain Tissue Donation Program, and the University of Pennsylvania Brain Bank of Psychiatric Illness and Alzheimer's Disease Core Center (Fromer et al., 2016). We used only RNA-seq and genotyping data from 279 unique individuals (dorsal lateral prefrontal cortex), who did not have neuropsychiatric diseases or neurological insults immediately before death. Approval was obtained from the National Institute of Mental Health (NIMH) repository and genomic resources. The GTEx project was established to characterize human transcriptomes within and across individuals for a wide variety of primary tissues and cell types (GTEx Consortium et al., 2017). Since > 90% of glioma arise from the supratentorial compartment of the brain (Larjavaara et al., 2007), we chose the RNA-seq and SNP data of eight supratentorial non-diseased brain tissues (anterior cingulate cortex, caudate nucleus, cortex, frontal cortex, hippocampus, hypothalamus, nucleus accumbens, and putamen) from 190 unique individuals for our analyses. Approval was obtained from dbGAP.

Genotyping and Whole-Genome Sequencing Data
CommonMind Consortium used the Infinium HumanOmniExpressExome v1.1 DNA Analysis Kit (Illumina, 958,178 SNPs) for genotyping. Following exclusion of those SNPs with genotyping call rate < 0.95, Hardy-Weinberg P-value < 5 × 10 −5 , and those without alternate alleles, we used Admixture v1.3.0 to ascertain ancestry and retained the samples of 216 unique individuals of European ancestry (Supplementary Table 1). We then performed imputation using the Sanger Imputation Server, with the UK10K and 1000 Genomes Phase 3 dataset as reference panels (1000Genomes Project Consortium et al., 2015UK10K Consortium et al., 2015). Those imputed SNPs with info score ≥ 0.5 were kept.
For the GTEx Consortium data (v7 release), we extracted SNPs from the genotype cell VCF file of whole-genome sequencing data. There were 10,526,813 SNPs at MAF ≥ 0.01. Quality control (QC) processes and admixture ascertainment were the same as those of the CMC datasets. Following QC and admixture analysis, we retained the genotyping data of 138 unique individuals of European ancestry for all subsequent QTL analyses (Supplementary Table 1).
To compile a complete list of candidate regulatory SNPs (or candidate functional SNPs) within the 25 glioma loci for eQTL and sQTL mapping, we empirically defined the SNP regions of interest as those localized ± 1.1 Mb from the top GWAS meta-analysis SNPs; candidate SNPs must have an r 2 ≥ 0.2 with those top SNPs. The distance of 1.1 Mb was chosen because prior study had identified the 99th percentile of distance between SNP and target gene to be approximately 1.1 Mb (Thibodeau et al., 2015). Altogether, we extracted 4074 SNPs among the 25 loci. Moreover, we retrieved an additional 394 SNPs with P-values ≤ 1 × 10 −8 in the fine mapping analysis of the GICC GWAS meta-analysis that have not yet been included among the 4074 SNPs (Melin et al., 2017). Some of these GWAS SNPs had r 2 < 0.2 with the top GWAS SNPs. Therefore, the final number of candidate SNPs for both QTL mapping was a total of 4468 (Supplementary Figure 1).

RNA-Seq Data
CommonMind Consortium (release 1) RNA-seq data (N = 216) were available as BAM files, which contained 100-bp pairedend reads and ≥50 million total reads per sample (Fromer et al., 2016). Sequencing was performed by using HiSeq2500 (Illumina). Sequencing libraries were prepared using rRNA depletion procedures. Downloaded BAM files for mapped and unmapped reads were merged by using SAMtools. Merged BAM files were converted into the FASTQ format using bam2fastq function within SAMtools.
We downloaded GTEx (v7) RNA-seq data from dbGAP as SRA files and converted them to FASTQ files using the SRA toolkit. Sequencing was performed using HiSeq 2500, which generated 76-bp paired-end reads and ≥50 million reads. We used a total of 670 FASTQ files from 138 unique individuals.
We processed FASTQ files from both CMC and GTEx using the same pipeline. We mapped reads from FASTQ files to hg19 using STAR v2.6 (two pass mode), with GENCODE V19 as the reference annotation (Dobin et al., 2013). FeatureCounts was used to generate the gene-level counts from the aligned reads (Liao et al., 2014). Quality control measures consisted of the criteria that all genes had >5 reads across all samples and <10 samples per gene had zero reads. Among the 25 loci (located ± 1.1 Mb from top GWAS SNPs), a total of 1559 met the criteria and were retained in both datasets for eQTL mapping. After quality control measures, we normalized and variance stabilized gene counts using DESeq2 (Love et al., 2014).

Alternative RNA Splicing Quantification
To prepare for sQTL mapping, alternative transcripts were first evaluated using LeafCutter, which is an established method that used short-read RNA-seq data to detect intron excision events at base-pair precision by analyzing mapped split reads . LeafCutter's intron-centric view of splicing is based on the observation that mRNA splicing occurs predominantly through the step-wise removal of introns from nascent pre-mRNA. The advantage of this method is that it does not require read assembly or inference of isoforms supported by ambiguous reads, both of which are computationally and statistically difficult. The detailed method has been discussed elsewhere (Raj et al., 2018). Briefly, LeafCutter pools all mapped reads, finds overlapping introns demarcated by split reads, and then constructs a graph that connects all overlapping introns sharing a donor or acceptor splice site. The connected components of the graph form intron clusters, which represented alternative intron excision events. LeafCutter iteratively applies a filtering step to remove rarely used introns, which are defined on the basis of the proportion of reads supporting a given intron compared with other introns in the same cluster, and re-clusters leftover introns. Across the 25 validated loci, there were a total of 518 target genes with 6326 alternatively spliced intron clusters that formed the basis of sQTL mapping.

eQTL Mapping
We performed eQTL analyses with covariates including age, sex, RIN scores, top three genotyping principal components, and 20 Probabilistic Estimation of Expression Residuals (PEER) factors, which were calculated from the normalized RNA expression matrices (Stegle et al., 2012). We used Matrix eQTL (v2.1.0) and multilevel linear regression with random intercept (lme4 package in R v3.5) for eQTL analysis of the CMC and GTEx datasets, respectively (Shabalin, 2012). Matrix eQTL does not accommodate multiple correlated brain tissue expression data from the same individuals; therefore, multilevel linear regression was used for the analysis of GTEx eQTL data. We adjusted for false discovery rate using the Benjamin-Hochberg method (FDR Q < 0.05).

sQTL Mapping
We focused on the SNP-intron cluster that was within the ±100-kb window, as prior studies had reported that sQTL are mostly concentrated within this genomic distance (Takata et al., 2017;Raj et al., 2018). For sQTL identification, LeafCutter found alternatively excised intron clusters and quantified intron excision levels in all samples . It then outputted intron excision proportions, which was the number of reads supporting a specific intron excision event to the total number of reads from that intron cluster. The ratios were then quantile normalized and used as input for Matrix eQTL and multilevel linear regression in sQTL analyses of CMC and GTEx, respectively. Covariates used were the same as eQTL mapping. To adjust for false discovery rates, the number of intron excision events in a cluster was first adjusted by Bonferroni correction, and subsequently FDR correction (FDR Q < 0.05) was applied for the number of sQTL per intron cluster. This 2-step multiplecomparison adjustment method was recommended for sQTL mapping with Leafcutter .

eQTL and sQTL Shared Between CMC and GTEx
To evaluate the sharing of eQTL and sQTLs between the two datasets, we used Storey's QVALUE software implemented in R package (Storey and Tibshirani, 2003). The program takes a list of p-values and computes their estimated π 0 , which is the proportion of features that are truly null. Then, the quantity π 1 = 1 -π 0 estimates the proportion of true positives (TP). Sharing between two samples was reported as the proportion of TP estimated from the p-value distribution of independent QTLs discovered in the CMC dataset that is also present in the GTEx dataset.

Meta-Analyses of eQTL and sQTL
For both eQTL and sQTL meta-analyses, we used the fixed-effect inverse variance weighted model to combine summary statistics (β and standard errors) of the CMC and GTEx. We used the I 2 statistics to quantify the percentage of variation across studies that is due to heterogeneity rather than chance and is inherently not dependent upon the number of studies considered. We took the I 2 value ≥ 75 to indicate significant heterogeneity. The mean overall I 2 for all eQTL sand sQTLs were 0%, indicating overall low level of heterogeneity. We conducted all meta-analyses using METAFOR (v2.4) in R. eQTL meta-analysis was considered significant at FDR Q < 0.05. For sQTL meta-analysis, the same two-level multipletesting adjustment (described above) was applied.

Integration of QTL Mapping and GWAS Data Using SMR Analyses
We integrated significant eQTL and sQTL meta-analysis results with the GICC GWAS meta-analysis (case:12,496, control:18,190) using summary-data-based mendelian randomization (SMR) analysis (Zhu et al., 2016;Melin et al., 2017). Approval for the GICC GWAS meta-analysis was obtained from the European Genome-phenome Archive (EGA). This SMR method assumed only one causal variant (affecting both probes and a trait) in any given locus; therefore, it tested the association between a trait and probes using the effect size of the top associated cis-QTLs from eQTL and sQTL mappings. A probe for eQTL was each significant target gene and for sQTL the individual intron cluster region. The input for the SMR analyses were significant QTL meta-analysis probes at FDR < 0.05 and GWAS SNPs with associated P-value < 5.0E-05. The 1000 Genome Project phase 3 data was used as the reference sample for LD estimation and allele frequency calculation (1000Genomes Project Consortium et al., 2015. According to the SMR methodology, significantly colocalized QTLs must pass a pSMR threshold that is based upon Bonferroni correction of the total number of probes tested, which equated to the total number of target genes (eQTL) and total number of intron cluster regions (sQTL), as well as a pHEIDI > 0.05 without multiple-testing correction (Zhu et al., 2016). For eQTL, the pSMR threshold was 8.20E-04 for 61 probes, and for sQTL, the threshold was 4.81E-04 for 104 probes.

Conditional Analyses
To find secondary QTL mapping signals, we performed conditional analyses using Genome-wide Complex Trait Analysis (GCTA) in the cis-QTL regions, condition on the top cis-eQTL or sQTL for significant probes found by SMR analyses (Yang et al., 2012). We used our QTL meta-analysis summary statistics data of GICC GWAS meta-analysis SNPs (P < 5.0E −05 ) for this purpose.

Enrichment Analyses of eQTL and sQTL SNPs Within Epigenomic Marks and RNA Protein (RBP)-Binding Sites
We evaluated the enrichment within epigenomic marks and RBPbinding sites of our significant eQTL and sQTL meta-analysis SNPs and SMR-associated SNPs. We carried out the enrichment analyses using GREGOR (Genomic Regulatory Elements and GWAS Overlap Algorithm) v1.3.1 (Schmidt et al., 2015). For epigenomic marks, we retrieved the following publicly available ChIP-seq data: H3K4me1, H3Kme3, H3K27ac, H3K9me3, H3K27me3, H3K36me3, and DHS from Roadmap Epigenome Consortium and NCBI GEO. ChIP-seq datasets of the following cell types were considered: normal astrocytes, GBM stem cells, neural stem cells, H9 cells, H9-derived neuronal progenitor cells, and GBM cell lines (Supplementary Table 2). For RBP sites, we downloaded all sites from cross-linking immunoprecipitation (CLIP)-seq data of 171 human RBP collected within the CLIPdb database and 371 from MotifMapRNA (Yang et al., 2015;Liu et al., 2017). Files were converted to BED format and concatenated together as a single annotation file before analyses.
GREGOR evaluates the significance of the observed overlap by estimating the probability of the observed overlap of our input SNPs relative to expectation using a set of matched control variants. The control variants are random control SNPs selected across the genome that match the input SNPs based upon the number of variants in LD, minor allele frequency, and distance to nearest genes or intron clusters. The P-value calculated by GREGOR assumed a sum of binomial distributions to represent the number of index SNPs that overlap a dataset compared to the expectation observed in the matched control sets. In addition, we adjusted the P-value using FDR (significance is Q < 0.05) due to multiple testing.

Functional Annotation and Visualization of SMR-Associated SNPs
We used SnpEff v4.3 to classify genomic positions of all SMRassociated SNPs (Cingolani et al., 2012). We further annotated these SNPs using the same set of publicly available ChIP-seq data as mentioned above for enrichment analyses. ChIP-seq data in BigWig files were aligned with LocusZoom plots in the UCSC genome browser to illustrate overlaps of SMR-associated SNPs within ChIP-seq peaks.
To evaluate the overlap of RNA-binding proteins (RBP) at genomic locations of SMR-associated SNPs for sQTL, we searched RBP-binding sites presented within CLIPdb and MotifMapRNA, as mentioned above (Yang et al., 2015;Liu et al., 2017). Furthermore, we used sQTLviztools (implemented in R) for visualization of sQTL (Tapial et al., 2017). Since LeafCutter does not provide a companion program for alternative transcript annotations, we overlapped the genomic coordinates of intron cluster regions with known alternatively spliced regions downloaded from VASTdb using the tool intersectBed within the BEDtools suite. Then, we identified known annotations using VAST tools (Tapial et al., 2017).

RESULTS eQTLs and sQTL Results of CMC and GTEx Datasets
To characterize the effect of glioma risk variants on gene regulatory processes in the brain, we performed a large-scale eQTL and sQTL scan in CMC and GTEx (Supplementary Figure 1). We tested a total of 4342 unique SNPs and 1559 target genes across 25 loci in eQTL, and 4342 unique SNPs, 518 spliced genes, and 6326 spliced intron cluster regions in sQTL analyses (Supplementary Figure 1). CMC and GTEx shared 1,859 significant eQTLs and 4,141 significant sQTLs (both at FDR Q < 0.05; Supplementary Tables 3, 4). We then estimated the degree of sharing of eQTLs and sQTLs between CMC and GTEx, using Storey's π1 statistic (see section "Materials and Methods"). We found π1 = 0.74 for eQTLs and π1 = 0.84 for sQTLs, which suggested substantial sharing of eQTLs and sQTLs between these two independent datasets. The effect size was also highly correlated between the two datasets for shared eQTLs (Pearson r = 0.62, P = 3.24 E −195 ) and sQTLs (Pearson r = 0.91, P < 2.2 E −16 ). Taken together, these findings demonstrated that there was substantial concordance between significant sQTLs and eQTLs identified in CMC and GTEx.

Summary Data-Based Mendelian Randomization Analysis (SMR)
Following meta-analyses, we used SMR analysis which integrated summary-level data from GICC GWAS meta-analysis with significant eQTL and sQTL meta-analyses (FDR Q < 0.05; see section "Materials and Methods"). SMR revealed 15 eQTLs in 11 loci and 32 sQTLs in 9 loci that exceeded the predefined p-value threshold of the SMR test and passed the pHEIDI test (Tables 1,  2 and Supplementary Tables 7, 8). Therefore, the target genes and spliced genes in Tables 1, 2 are associated with glioma due to pleiotropy and are the most functionally relevant.
We then performed conditional association analyses to evaluate the possibility of secondary QTL signals. Using Genomewide Complex Trait Analysis (GCTA), condition on the top cis-eQTL or sQTL for significant probes identified through SMR analyses (those in Tables 1, 2), we did not find any secondary QTL signals.

Enrichment Analyses of Meta-Analyses SNPs and SMR-Associated SNPs
We then evaluated whether significant QTL meta-analysis SNPs and SMR-associated SNPs were enriched with regulatory elements. Among meta-analysis SNPs, enrichment tests were significant (FDR Q values < 0.05) for DNase 1 hypersensitivity site (DHS), H3K36me3, H3K4me1, H3K4me3, H3K27ac, and H3K9me3 but not H3K27me3 for eQTL, whereas enrichment analyses were only positive for DHS and H3K36me3 for sQTL (Supplementary Table 9). In SMR-associated SNPs, only DHS enrichment was significant for eQTL and borderline significant for sQTL (Supplementary Table 9).
We also tested enrichment in RNA-binding protein (RBP) sites for SNPs associated with splicing QTLs. Similar to histone marks, meta-analysis SNPs were significantly enriched for RBP sites, but SMR-associated SNPs were not (Supplementary Table 9).
The lack of significance for histone marks and RBP site enrichments in SMR-associated SNPs may reflect low statistical power, because the total input SNPs were far fewer for those associated with SMR compared to the numbers in meta-analyses (Supplementary Table 9).
Similar to target genes of eQTLs, the target spliced genes were not always the closest in physical distance to the candidate functional SNPs, even though alternative splicing mediated by risk alleles is usually within 100 kb of the candidate functional SNPs (Takata et al., 2017;Raj et al., 2018). In six of nine sQTL loci, namely, 7p11.2, 11q23.3, 16p13.3, 16q12.1, 20q13.33, and 22q13.1, the target genes were not the nearest genes to the SMR-associated SNPs (Figures 2A-D and Supplementary  Figures 3C,E). In 20q13.33, four SNPs regulated the alternative splicing of RTEL-TNFSF6B, but only one (rs3208007, exonsynonymous) was located within the target gene (Table 4). rs1295810 was situated within the 3 UTR of ARFRP1 and rs4809328 within an intron of ZGPAT, which were telomeric to RTEL1-TNFRSF6B. The 4th SNP rs2150910 was within  the 3 UTR of STMN3, which was centromeric to RTEL1-TNFRSF6B ( Figure 2D). In LIME1, another target gene that was alternatively spliced in 20q13,33, the associated SNP rs6122154 resided within an intron of ZGPAT, which was centromeric to LIME1 ( Figure 2D). In 11q23.3, rs11216924, rs73023341, and rs61900957 were the three SNPs associated with alternative splicing of TMEM25 (Table 2). However, rs11216924 and rs73023341 were both intronic SNPs within ARCN1, whereas rs61900957 was an intronic SNP within PHLDB1 (Figure 2A and Table 4). Both ARCN1 and PHLDB1 are genes telomeric to the target gene TMEM25. In 16p13.3, rs34316274 was localized within an intron of LMF1 but mediated alternative splicing of the nearby lncRNA RP11-161M6.2 as well as LMF1 itself (Figure 2B and Table 4). Similarly, within 16q12.1, the closest gene to rs2058815 was CNEP1R1; nevertheless, it affected alternative splicing of HEATR3, which was further telomeric to CNEP1R1 (Figure 2C and Table 4). In 22q13.1, rs6000943 is an intronic SNP within MICALL1; however, its effect was on the splicing of C22orf23, which was the target gene telomeric to it (Supplementary Figure 3E and Table 4). Lastly, rs2699247 in 7p11.2 affected the alternative transcription of SEC61G despite its location within an intron of the lncRNA RP11-745C15.2 (Supplementary Figure 3C and Table 4).

DISCUSSION
The finding of this study is the first to show that glioma GWAS risk alleles may mediate their effect through alternatively spliced transcripts. Two loci, 1p44 and 16q13.3, only harbored sQTLs, and an additional seven loci had coexisting eQTL and sQTLs. Furthermore, there were more SMR-associated sQTLs than eQTLs (32 sQTLs versus 15 eQTLs), which suggested that alternative splicing may be an important molecular mechanism in gliomagenesis mediated by GWAS risk alleles. Target genes were different between eQTL and sQTL for many of the loci with both types of QTL, which further illustrated the complexity of functional regulation of these risk loci (Gusev et al., 2019).
The most common type of alternative RNA splicing in the brain is skipped exon (Vuong et al., 2016;Reble et al., 2018), and the brain ranked the highest among 16 human tissues in terms of the proportion of genes with skipped exons (Yeo et al., 2004). This study found that 11 of the 14 known annotated sQTL involved skipped exons, which is concordant with what is known about the alternative splicing mechanism of the brain to date. Our finding also suggested that many of the sQTL SNPs associated with spliced RNA were within binding sites for RNA-binding proteins (RBPs), thus raising the possibility that these variants may alter the ability of RBPs to bind and interact with pre-mRNA and other RBPs within a spliceosome (Fu and Ares, 2014). Moreover, glioma risk variants may affect splicing through the process of trimethylation of H3K36 (Leung et al., 2019), and H3K36me3 is the second most common epigenomic mark overlapping with sQTL but not eQTL SNPs in this study. Methylation of H3K36 in the gene body (introns) has recently been revealed to be an important facilitator of spliceosome assembly, through its ability to recruit various adaptor proteins to support RNA splicing (Teissandier and Bourc'his, 2017;Leung et al., 2019). Furthermore, H3K36me3 has been associated with exon skipping (Shindo et al., 2013). Thus, glioma risk alleles may promote exon skipping through interference of the H3K36 trimethylation process and spliceosome assembly (Monteuuis et al., 2019). There was no overlap between SMR-associated SNPs of eQTLs and sQTLs; moreover, with the exception of 2p33.3 (C2orf80) and 16q12.1 (HEATR3), the target genes were otherwise different for the remaining five loci with coexisting eQTLs and sQTLs. This suggests that sQTLs may act independent of eQTLs in mediating gliomagenesis. If quantitative trait loci mapping did not include splicing evaluation, a total of 12 of 26 (46.2%) target genes would have been missed, including two loci (1q44 and 16p13.3) which exclusively harbored sQTLs and no eQTLs.
Of all the target genes involved in sQTLs, none had previously known molecular mechanism mediated by alternative RNA splicing in glioma, although a number of the genes had been shown to be involved in glioma pathogenesis, progression, or prognosis. For example, AKT3 (1q44) promotes glioma progression and represents a key resistance factor (Turner et al., 2015). SEC61G is a proto-oncogene required for glioblastoma cell survival (Lu et al., 2009). DDX6 is involved in radioand chemoresistance in glioblastoma (Cho et al., 2016), and TNFRSF6B suppresses CD95 ligand-induced apoptosis and chemotaxis in malignant glioma (Roth et al., 2001). The functional roles of the rest of spliced target genes have yet to be discovered. Of interest, RTEL1 and the read-through transcript RTEL1-TNFRSF6B had been postulated to be target genes in 20q13.33 due to its role in telomere maintenance and also the fact that the top glioma GWAS risk allele resides within an intron of RTEL1 (rs2297440) (Melin et al., 2017). However, no prior evaluation provided evidence that it is a significant eQTL target gene, and its biological role in the promotion and progression of glioblastoma has yet to be elucidated. This study found that RTEL1-TNFRSF6B is an sQTL but not an eQTL target gene, and three of the four SMR-associated sQTL SNPs were mapped outside of RTEL1-TNFRSF6B. Similarly, PHLDB1 has long been speculated as the target gene in 11q23.3 due to the location of the top GWAS SNP which is within an intron of the gene (rs12803321) and its role in modulating AKT phosphorylation (Zhou et al., 2010), but this study found alternative splicing of the PHLDB1 transcript to be the mechanism mediated by germline SNPs.
Among SMR-associated eQTL loci, 7p11.2, 11q23.3, 20q13.33, and 22q13.1 have coexisting sQTLs, but their target genes were different, suggesting that the regulation on gliomagenesis by these loci is more complex than previously realized. Among eQTL target genes, BCL9L (11q23.3), SCAPER (15q24.2), RP11-541N10.3 (10q24.33) CDKN2B-AS1 (9p21.3), and C2orf80 (2q33.3) were newly discovered, replicated, and integrated with GWAS findings, and they have not been previously reported in eQTL analyses using non-diseased brain tissues. In 11q23.3, which is a locus associated with IDH1-mutated glioma, the hypothesized target gene had been PHLDB1. However, our finding showed that the gene is BCL9L, which is a transcription regulator associated with WNT signaling in glioma (Lee et al., 2016;Gay et al., 2019). In 15q24.2, the target gene SCAPER transcribes into a cyclin A-interacting protein which regulates cell cycle progression, but its role in gliomagenesis had not been previously evaluated (Tsang et al., 2007). RP11-541N10.3 in 10q24.33 is a long non-coding RNA (lncRNA) with unknown function in glioma development. Within 2q33.3, which is also a locus associated with IDH1-mutated glioma (Kinnersley et al., 2018), the target gene C2orf80 is 50 kb telomeric to IDH1, but whether or how C2orf80 interacts with IDH1 remains to be elucidated. In 9p21.3, a recent transcriptome-wide association study (TWAS) in glioma only found CDKN2B but not its antisense transcript as candidate causal genes (Atkins et al., 2019). To our knowledge, this study is the first to report CDKN2B-AS1 (ANRIL), as well as CDKN2B as significant potential target genes in 9p21.3.
Common to other QTL mapping studies, a limitation of this study is the context of gene expression, which is limited to that of adult normal brain. If the manifestation of sQTL or eQTL is dynamic and only occurred during a certain developmental stage or early in gliomagenesis, this study is not set up to discover them. However, the advantage of using adult normal brain tissues is the ability to procure them through autopsy, and the relative confidence of isolating the influence of SNPs on gene expressions without the confounding effects of somatic alterations. We also acknowledged that bulk RNA-seq data consisted of a mixture of neurons and glial cells, but the singlecell eQTL dataset is uncommon and has less discovery power (6.9-fold differences) than similar-sized bulk RNA-seq QTL datasets (van der Wijst et al., 2018). Therefore, we aimed to leverage QTL meta-analyses using bulk RNA-seq for maximum power. Last, a limitation of using cell lines in ChIP-seq analysis is that it requires a large number of cells (>10 5 cells), and the analysis focuses on average peak calling without accounting for heterogeneity between cells. However, the single-cell ChIP-seq dataset is rarely available, and the abundance of bulk ChIPseq data from related cell types may allow for comparisons of functional elements.
In summary, this study identified alternative RNA splicing as a potential mechanism that may provide additional explanations for the functional basis of nine glioma risk loci. This study also showed that functional variants may influence total transcript abundance as well as spliced isoforms, and the target genes for eQTL and sQTL may differ in loci with both types of QTL. Finally, this meta-analysis identified comprehensively target genes that may serve as a reference for future functional assays.